Method and apparatus for converting a channel-based 3D audio signal to an HOA audio signal

ABSTRACT

A system for converting a channel-based 3D audio signal to a higher-order Ambisonics (HOA) audio signal, in which the channel-based 3D audio signal is transformed from time domain to frequency domain. A primary ambient decomposition is carried out for three-channel triplets of blocks of the frequency domain channel-based 3D audio signal, wherein directional signals and ambient signals are provided for each triplet. From the directional signals, directional information of a total directional signal for each triplet is derived. That total directional signal is HOA encoded according to the derived directions, and the ambient signals are HOA encoded according to channel positions. The HOA coefficients of the HOA encoded directional signal and the HOA coefficients of the HOA encoded ambient signals are superimposed in order to obtain an HOA coefficients signal for the channel-based 3D audio signal, followed by a transformation into time domain.

TECHNICAL FIELD

The invention relates to a method and to an apparatus for converting a channel-based 3D audio signal to an HOA audio signal using primary ambient decomposition.

BACKGROUND

With the emergence of different immersive audio technologies, such as channel-based approaches like Auro-3D [9] or NHK 22.2 [10] and higher order Ambisonics (HOA), it is desirable to find a reasonable way of converting audio channels to HOA coefficients and vice versa. One of the advantages of HOA is its rendering flexibility to arbitrary loudspeaker setups. On the one hand, it is simple to convert HOA coefficients to audio channels by means of an HOA renderer using channel positions as speaker positions. On the other hand, it could be argued that conversion of audio channels to HOA coefficients can be carried out by passing audio channels to HOA encoding employing channel positions as directional information.

SUMMARY OF INVENTION

However, audio channels are typically a mix of directional and ambient sound signals in order to meet a good compromise between audio image sharpness for clear localisation of audio sources and spaciousness for an enhanced feeling of envelopment and/or spatial immersion. Therefore, it is more reasonable to extract directional signals inherent in audio channels and corresponding directional information for HOA encoding. In this context, primary ambient decomposition (PAD) techniques can be employed.

A problem to be solved by the invention is to provide an HOA audio signal from a channel-based 3D audio signal. This problem is solved by the method disclosed in claim 1. An apparatus that utilises this method is disclosed in claim 2. Advantageous additional embodiments of the invention are disclosed in the respective dependent claims.

The processing described below converts audio channels in 3D audio into HOA by means of primary ambient decomposition. This conversion is performed as follows:

-   Triangulation according to channel positions, so that audio channels are divided into non-overlapping triangles with three channel positions as vertices;
-   Successive primary ambient decomposition for triplets in order to derive directional and ambient signals in each triplet;
-   Deriving directional information of the total directional signal for each triplet and HOA encoding the total directional signal according to derived directions;
-   Ambient signals are encoded to HOA according to channel positions;
-   Superimposing HOA coefficients corresponding to directional and ambient signals in order to obtain the total HOA coefficients of the input audio channels.

In principle, the inventive method is adapted for converting a channel-based 3D audio signal to a higher-order Ambisonics HOA audio signal, said method including:

-   if said channel-based 3D audio signal is in time domain, transforming said channel-based 3D audio signal from time domain to frequency domain;
-   carrying out a primary ambient decomposition for three-channel triplets of blocks of said frequency domain channel-based 3D audio signal, wherein related directional signals and ambient signals are provided for each triplet;
-   from said directional signals, deriving directional information of a total directional signal for each triplet;
-   HOA encoding said total directional signal according to said derived directions, and HOA encoding ambient signals according to channel positions;
-   superimposing HOA coefficients of said HOA encoded directional signal and HOA coefficients of said HOA encoded ambient signal in order to obtain an HOA coefficients signal for said channel-based 3D audio signal;
-   transforming said HOA coefficients signal to time domain.

In principle, the inventive apparatus is adapted for converting a channel-based 3D audio signal to a higher-order Ambisonics HOA audio signal, said apparatus including means adapted to:

-   if said channel-based 3D audio signal is in time domain, transform said channel-based 3D audio signal from time domain to frequency domain;
-   carry out a primary ambient decomposition for three-channel triplets of blocks of said frequency domain channel-based 3D audio signal, wherein related directional signals and ambient signals are provided for each triplet;
-   from said directional signals, derive directional information of a total directional signal for each triplet;
-   HOA encode said total directional signal according to said derived directions, and HOA encode ambient signals according to channel positions;
-   superimpose HOA coefficients of said HOA encoded directional signal and HOA coefficients of said HOA encoded ambient signal in order to obtain an HOA coefficients signal for said channel-based 3D audio signal;
-   transform said HOA coefficients signal to time domain.

BRIEF DESCRIPTION OF DRAWINGS

Exemplary embodiments of the invention are described with reference to the accompanying drawings, which show in:

FIG. 1 Triangulation of NHK 22 channels into 40 triangles;

FIG. 2 Converting triplet channel signals to HOA signals;

FIG. 3 Flow diagram for multi-channel primary-ambient decomposition;

FIG. 4 Panning angle ϕ₁₂[i] and reference angle ϕ_(R) for direction determination;

FIG. 5 Spherical coordinate system.

DESCRIPTION OF EMBODIMENTS

Even if not explicitly described, the following embodiments may be employed in any combination or sub-combination.

A. System Description

The system is defined under an audio analysis and synthesis framework. That is, individual audio channels are transformed to the frequency domain by means of an analysis filter bank such as FFT. After frequency domain processing, signals are converted to the time domain via a synthesis filter bank such as IFFT. In order to avoid artefacts at block boundaries, windowing and overlapping are performed during the analysis, while windowing and overlap-add are carried out during synthesis. In the sequel, the analysis process is denoted as T-F, while the synthesis process is denoted as F-T.

A.1 Triangulation

Given input channel positions in 3D space on a unit sphere, triangulation can be accomplished by means of a Delaunay triangulation [7] using the Quickhull algorithm [8], so that triplets consisting of three channels can be obtained. FIG. 1 shows the triangulation results for NHK 22 channels, which comprise four levels, namely a bottom layer with three channels, indicated by vertices 20 to 22, a middle layer with ten channels 1 to 10, a height layer with eight channels 11 to 18, and a top layer with channel 19.

In case there are only three input audio channels, no triangulation is carried out. In the following, the term ‘triplet’ is also used for such three audio channels.

A.2 Successive Primary-Ambient Decomposition PAD

PAD decomposes individual channel signals into directional and ambient components by exploiting inter-channel correlation. It is assumed that a directional signal is a correlated signal among channels, while ambient signals are uncorrelated with each other and are also uncorrelated with directional signals. Accordingly, directional signals provide localisation, while ambient signals deliver spatial impression.

For triplets, e.g. obtained from triangulation, PAD is carried out successively. Different strategies can be employed to determine in which order the successive decomposition is carried out. One way is to decide the decomposition order according to triplet powers. That means, a triplet with a higher total power is decomposed earlier than a triplet with a lower total power, where the total power is the sum of the three channel powers belonging to a triplet.
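The power-based ordering described above can be sketched as follows. This is an illustration only: the function name `order_triplets_by_power`, the dictionary `channel_power` and the example values are assumptions made for this sketch and are not taken from the described embodiments.

```python
from typing import Dict, List, Sequence, Tuple

Triplet = Tuple[int, int, int]


def order_triplets_by_power(
    triplets: Sequence[Triplet], channel_power: Dict[int, float]
) -> List[Triplet]:
    """Sort triplets so that the triplet with the highest total power comes first.

    The total power of a triplet is the sum of its three channel powers.
    """
    def total_power(triplet: Triplet) -> float:
        return sum(channel_power[ch] for ch in triplet)

    return sorted(triplets, key=total_power, reverse=True)


# Hypothetical channel powers and triplets, for illustration only.
powers = {1: 0.5, 2: 0.1, 3: 0.2, 4: 2.0}
order = order_triplets_by_power([(1, 2, 3), (2, 3, 4), (1, 3, 4)], powers)
```

Only the ordering is shown here; how the decomposition of one triplet affects the channels shared with later triplets is not covered by this sketch.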

Given the decomposition order, PAD is carried out for individual triplets, which delivers directional and ambient signals of three channels.

A.3 HOA Encoding

For each triplet, three directional signals are combined to a total directional signal according to the principle of summing localisation, while the directions can be derived by means of panning laws. As a result, the total directional signal is converted to HOA.

For ambient signals, channel positions serve as directions to convert ambient signals to HOA. The addition of the HOA converted directional signal and the ambient signal forms the HOA signal for the considered triplet. Summing HOA signals of all triplets results in the HOA signal for the input channel signals.

FIG. 2 illustrates the processing chain for three channels of a triplet within the analysis-synthesis framework. In the following sections, individual modules in FIG. 2 are explained in more detail. Three-channel PAD is used as generalisation of the approach in [2] in order to enter the complex filter bank domain (i.e. complex spectra), and to get three channels using a channel model in order to explicitly take into account spatial cues like inter-channel phase and/or delay difference.

B. Three-Channel Primary-Ambient Decomposition

Let {x_(m)[k], 1≤m≤3} denote time-domain audio samples for a specific triplet after triangulation. The primary-ambient decomposition in step or stage 22 in FIG. 2 is carried out in the frequency domain downstream of a time-to-frequency transform step or stage 21 using e.g. a short-time Fourier transform. The corresponding spectra are denoted as {X_(m)[k,i], 1≤m≤3}, where k denotes the k-th audio signal block following the transform and i is the frequency bin index. X_(m)[k,i] is the input signal in step 31 in FIG. 3. For notational simplicity, the block index k is dropped in the sequel. Accordingly, the channel model is as follows:

$\begin{matrix}{{X_{m}[i]} = {{A_{m}[i]}e^{j\theta_{m}[i]}{S[i]}} + {N_{m}[i]},\quad 1 \leq m \leq 3,} & (1)\end{matrix}$

where A_(m)[i]e^(jθ_(m)[i])S[i] is the directional component present in individual channels, and {N_(m)[i]} are uncorrelated ambient components. That is,

$\begin{matrix}{{E\left\{ N_{m}[i]N_{n}^{*}[i] \right\}} = {\sigma_{m}^{2}[i]\,\delta(m - n)},\quad {E\left\{ N_{n}[i]S^{*}[i] \right\}} = 0,\quad {E\left\{ \left( A_{m}[i]e^{j\theta_{m}[i]}S[i] \right)\left( A_{m}[i]e^{- j\theta_{m}[i]}S^{*}[i] \right) \right\}} = {A_{m}^{2}[i]P_{S}[i]},} & (2)\end{matrix}$

where E{.} denotes statistical expectation, (.)* denotes complex conjugation, n denotes a channel and δ(.) is the discrete-time delta function. Accordingly, A_(m)[i]≥0 denotes a non-negative amplitude panning gain.

The model represented by equation (1) takes three different spatial cues into account, namely, inter-channel level difference indicated by A_(m)[i] and inter-channel delay/phase differences indicated by θ_(m)[i], where inter-channel delay differences can be interpreted as frequency-dependent phase differences as shown in [4] and [6]. Note that the channel model presented in [2] only considers inter-channel level differences.

Primary-ambient decomposition can be carried out in three steps:

-   -   Directional and ambient power estimation;    -   Linear spectral estimation based on minimum mean square error        principle;    -   Post-scaling of estimated spectra for power maintenance.

In the following, three-channel PAD is described for individual steps, employing the channel model of equation (1).

B.1 Directional and Ambient Power Estimation

According to the model assumptions in equation (2), signal powers for individual channels can be evaluated in step 32 as

$\begin{matrix}{{P_{m}[i]} = {E\left\{ \left| X_{m}[i] \right|^{2} \right\}} = {\underbrace{A_{m}^{2}[i]P_{S}[i]}_{P_{S_{m}}[i]} + {\sigma_{m}^{2}[i]}.}} & (3)\end{matrix}$

Cross correlations between the m-th channel signal and the n-th channel signal are determined in step 32 as

$\begin{matrix}{{c_{mn}[i]} = {E\left\{ X_{m}[i]X_{n}^{*}[i] \right\}} = {A_{m}[i]A_{n}[i]e^{j{({\theta_{m}[i] - \theta_{n}[i]})}}P_{S}[i]},\quad m \neq n.} & (4)\end{matrix}$

Without loss of generality, the n-th channel is defined as reference channel with θ_(n)[i]≡0 and A_(n)[i]≡1. Therefore, A_(m)[i] and θ_(m)[i] are relative to the n-th channel. Consequently,

$\begin{matrix}{{c_{mn}[i]} = {E\left\{ X_{m}[i]X_{n}^{*}[i] \right\}} = {A_{m}[i]e^{j\theta_{m}[i]}P_{S}[i]},\quad m \neq n.} & (5)\end{matrix}$

The advantage of introducing a reference channel is to avoid an explicit gain and angle estimation for individual channels, which will become clear during the derivation process. Signal powers and cross correlations can empirically be estimated either by a moving average or by recursion using a forgetting factor as follows:

$\begin{matrix}{\begin{aligned}{{\hat{P}}_{m}[k,i]} &= {\frac{1}{K}\sum\limits_{q = 0}^{K - 1}\left| X_{m}[k - q,i] \right|^{2}}, \\ {{\hat{P}}_{m}[k,i]} &= {\lambda\left| X_{m}[k,i] \right|^{2} + (1 - \lambda){\hat{P}}_{m}[k - 1,i]}, \\ {{\hat{c}}_{mn}[k,i]} &= {\frac{1}{K}\sum\limits_{q = 0}^{K - 1}X_{m}[k - q,i]X_{n}^{*}[k - q,i]}, \\ {{\hat{c}}_{mn}[k,i]} &= {\lambda\left( X_{m}[k,i]X_{n}^{*}[k,i] \right) + (1 - \lambda){\hat{c}}_{mn}[k - 1,i]}.\end{aligned}} & (6)\end{matrix}$

For simplicity, instead of P̂_(m)[.] and ĉ_(mn)[.], P_(m)[.] and c_(mn)[.] will be used in the sequel as estimated signal powers and cross correlations.
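As an illustration of the recursive variant in equation (6), the following sketch updates one power estimate and one cross-correlation estimate for a single frequency bin. The function name `update_estimates` and the example values are assumptions made for this sketch.

```python
from typing import Tuple


def update_estimates(
    p_prev: float, c_prev: complex, x_m: complex, x_n: complex, lam: float
) -> Tuple[float, complex]:
    """One recursive update per equation (6) for a single frequency bin.

    p_prev, c_prev: estimates from block k-1
    x_m, x_n:       spectra X_m[k,i] and X_n[k,i] of the current block
    lam:            forgetting factor lambda, 0 < lam <= 1
    """
    p_new = lam * abs(x_m) ** 2 + (1.0 - lam) * p_prev
    c_new = lam * (x_m * x_n.conjugate()) + (1.0 - lam) * c_prev
    return p_new, c_new


# Hypothetical values, for illustration only.
p_hat, c_hat = update_estimates(1.0, 0j, 2 + 0j, 1j, 0.5)
```

A larger forgetting factor tracks changes faster, while a smaller one averages over more blocks.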

The directional signal power P_(S_m)[i] is resolved in step 33 by means of c_(mn)[i]:

$\begin{matrix}{{P_{S_{m}}[i]} = \frac{\left| c_{mn_{1}}[i] \right|\left| c_{mn_{2}}[i] \right|}{\left| c_{n_{1}n_{2}}[i] \right|},\quad m \neq n_{1},\ m \neq n_{2},\ n_{1} \neq n_{2},\ 1 \leq m,n_{1},n_{2} \leq 3,} & (7)\end{matrix}$

and the ambient power is estimated by inserting equation (7) into equation (3) as

$\begin{matrix}{{\sigma_{m}^{2}[i]} = {P_{m}[i]} - \frac{\left| c_{mn_{1}}[i] \right|\left| c_{mn_{2}}[i] \right|}{\left| c_{n_{1}n_{2}}[i] \right|},} & (8)\end{matrix}$

wherein c_(n_1 n_2)[i] is the cross correlation for the i-th frequency bin between the n₁-th channel and the n₂-th channel, see equation (4).
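The following sketch evaluates equations (7) and (8) for one frequency bin, using cross correlations built from the model of equation (4). The function names and all numeric values (gains, phases, powers) are assumptions chosen for illustration.

```python
import cmath


def directional_power(c_mn1: complex, c_mn2: complex, c_n1n2: complex) -> float:
    """Equation (7). Undefined when c_n1n2 is zero; a practical system
    would need to guard against that case (not shown here)."""
    return abs(c_mn1) * abs(c_mn2) / abs(c_n1n2)


def ambient_power(p_m: float, p_s_m: float) -> float:
    """Equation (8). With estimated quantities the result can be negative."""
    return p_m - p_s_m


# Hypothetical model values: gains A_m, phases theta_m, power P_S, ambient powers.
gains = [1.0, 0.5, 2.0]
phases = [0.0, 0.3, -0.7]
p_s = 4.0
sigma2 = [1.0, 2.0, 4.0]


def cross(m: int, n: int) -> complex:
    """Ideal cross correlation c_mn according to equation (4)."""
    return gains[m] * gains[n] * cmath.exp(1j * (phases[m] - phases[n])) * p_s


# Second channel (index 1): expect P_S_m = A_m^2 * P_S = 1.0.
p_s_2 = directional_power(cross(1, 0), cross(1, 2), cross(0, 2))
p_2 = gains[1] ** 2 * p_s + sigma2[1]      # equation (3)
amb_2 = ambient_power(p_2, p_s_2)          # recovers sigma2[1]
```

With ideal cross correlations the ratio recovers the directional power exactly; with estimated ones it generally does not.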

The problem associated with using the cross correlation ratio for estimating P_(S_m)[i] of equation (7) is that it cannot be guaranteed that the estimated ambient power in equation (8) is non-negative. Therefore, the estimated directional power in equation (7) is post-processed in step 34, such that the estimated directional power, denoted as P_(S_m)^(1)[i], is (i) less than P_(m)[i] for sure and (ii) approaching P_(S_m)[i] as far as possible.

If the estimated channel signal power P_(m)[i] is greater than or equal to the estimated directional signal power P_(S_m)[i], i.e. P_(m)[i]≥P_(S_m)[i], P_(S_m)^(1)[i] is set to P_(S_m)[i].

If the estimated channel signal power P_(m)[i] is smaller than the estimated directional signal power P_(S_m)[i], i.e. P_(m)[i]<P_(S_m)[i], a function for limiting P_(S_m)[i] can be

$\begin{matrix}{{P_{S_{m}}^{(1)}[i]} = {\beta\, P_{m}[i]\left( 1 - e^{- \alpha\frac{P_{S_{m}}[i]}{P_{m}[i]}} \right)},} & (9)\end{matrix}$

which increases with the ratio P_(S_m)[i]/P_(m)[i] and is limited to βP_(m)[i]. Parameter β is a positive value near ‘1’, e.g. β=0.99. Parameter α controls how fast P_(S_m)^(1)[i] approaches βP_(m)[i], e.g. α=1.3. When employing the post-processed directional signal power, a non-negative ambient power can always be guaranteed.

Setting P_(S_m)^(1)[i]=P_(m)[i] for the P_(m)[i]<P_(S_m)[i] case would result in ambient powers equal to zero, which however caused audible artefacts in experiments.
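The two cases of the post-processing in step 34 can be sketched as below, using the example parameters α=1.3 and β=0.99 from the text. The function name `limit_directional_power` and the guard for a zero channel power are assumptions of this sketch.

```python
import math


def limit_directional_power(
    p_s_m: float, p_m: float, alpha: float = 1.3, beta: float = 0.99
) -> float:
    """Post-process the estimated directional power of one channel and bin.

    If P_m >= P_S_m the estimate is kept; otherwise it is limited
    according to equation (9), so that the result stays below beta * P_m.
    """
    if p_m >= p_s_m:
        return p_s_m
    if p_m <= 0.0:
        # Assumption of this sketch: a channel without power carries
        # no directional power either (avoids division by zero).
        return 0.0
    return beta * p_m * (1.0 - math.exp(-alpha * p_s_m / p_m))


# Hypothetical values, for illustration only.
unchanged = limit_directional_power(1.0, 2.0)   # P_m >= P_S_m: kept as is
limited = limit_directional_power(3.0, 2.0)     # P_m <  P_S_m: equation (9)
ambient = 2.0 - limited                         # remaining ambient power
```

Because the limited value never reaches β times the channel power, the ambient power obtained by subtraction stays strictly positive in the second case.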

In summary, bin-wise directional and ambient power estimation is carried out in steps 31 to 34 as follows:

-   Evaluate spectra of individual channels by a time-frequency transform such as short-time Fourier transform in order to get {X_(m)[i],1≤m≤M};
-   Estimate signal powers and inter-channel cross correlations as {P_(m)[i]} and {c_(mn)[i]}, see equation (6);
-   Estimate directional signal powers {P_(S_m)[i]} according to equation (7);
-   Post-process estimated directional signal powers like in equation (9) in order to guarantee that (i) the estimated ambient powers are non-negative and (ii) the post-processed estimated directional signal powers well approximate the originally estimated ones in equation (7);
-   Estimate ambient powers based on post-processed estimated directional powers as σ_(m)²[i]=P_(m)[i]−P_(S_m)^(1)[i].

For notational simplicity, P_(S_m)[i] instead of P_(S_m)^(1)[i] is used for the post-processed directional powers in the following.

B.1.1 Band-Wise Evaluation

Based on bin-wise estimation results, band-wise counterparts can also be evaluated, where frequency bins are divided into bands like critical bands or equivalent rectangular bandwidth bands. The intention is on the one hand the computational efficiency with band-wise evaluation, and on the other hand averaging in band-wise evaluation may reduce estimation errors associated with bin-wise evaluation.

Let the bin index range for the b-th frequency band be [b_(l),b_(u)]. Band signal power and band-wise inter-channel cross correlation can be defined, similarly as in [3]:

$\begin{matrix}{{P_{m,b}} = {\sum\limits_{i = b_{l}}^{b_{u}}P_{m}[i]},\quad {c_{mn,b}} = {\sum\limits_{i = b_{l}}^{b_{u}}c_{mn}[i]}.} & (10)\end{matrix}$

Similarly, directional and ambient band powers can be defined as

$\begin{matrix}{{P_{S_{m},b}} = {\sum\limits_{i = b_{l}}^{b_{u}}P_{S_{m}}[i]},\quad {\sigma_{m,b}^{2}} = {P_{m,b} - P_{S_{m},b}} = {\sum\limits_{i = b_{l}}^{b_{u}}\sigma_{m}^{2}[i]}.} & (11)\end{matrix}$
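The band-wise sums of equations (10) and (11) amount to summing bin-wise values over inclusive index ranges. A minimal sketch, in which the function name `band_sums` and the band edges are hypothetical:

```python
from typing import List, Sequence, Tuple


def band_sums(values: Sequence, bands: Sequence[Tuple[int, int]]) -> List:
    """Sum bin-wise values over each band [b_l, b_u], both ends inclusive.

    Works for real powers as well as complex cross correlations.
    """
    return [sum(values[b_l:b_u + 1]) for b_l, b_u in bands]


# Hypothetical bin-wise values and band edges, for illustration only.
bands = [(0, 1), (2, 4)]
p_m = [1.0, 2.0, 3.0, 4.0, 5.0]        # bin-wise channel powers
p_s_m = [0.5, 1.0, 1.0, 3.0, 2.0]      # bin-wise directional powers

p_m_b = band_sums(p_m, bands)          # equation (10)
p_s_m_b = band_sums(p_s_m, bands)      # equation (11), first part
sigma2_m_b = [p - s for p, s in zip(p_m_b, p_s_m_b)]  # equation (11), second part
```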

B.2 Spectral Linear Minimum Mean Square Error (LMMSE) Estimation

B.2.1 Directional Signal

Linear spectral estimation for the directional signal in the reference channel based on input channels reads Ŝ[i]=Σ_(m=1)^(M) w_(S_m)[i]X_(m)[i], and the estimation error signal becomes

$e_{S}[i] = {\hat{S}[i] - S[i]} = {\left( {\sum\limits_{m = 1}^{M}w_{S_{m}}[i]A_{m}[i]e^{j\theta_{m}[i]}} - 1 \right)S[i] + \sum\limits_{m = 1}^{M}w_{S_{m}}[i]N_{m}[i]}.$

The linear estimation coefficients can be evaluated based on the principle of orthogonality in order to minimise the mean squared error E{|e_(S)[i]|²}. It can be shown that

$\begin{matrix}{{w_{S_{n}}[i]} = \frac{{PAR}_{n}[i]}{R_{s}[i] + 1},\quad {w_{S_{m}}[i]} = {\frac{c_{nm}[i]/\sigma_{m}^{2}[i]}{R_{s}[i] + 1}\ \text{for}\ m \neq n},} & (12)\end{matrix}$

where the primary-to-ambient ratio (PAR) can be defined for individual channels and for each frequency bin as PAR_(m)[i]=P_(S_m)[i]/σ_(m)²[i] and the sum of PARs is defined as R_(s)[i]=Σ_(m=1)^(M) PAR_(m)[i].
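A small numerical sketch of equation (12) for one frequency bin is given below. The function name `directional_weights`, the zero-based channel indices and the model values are assumptions of this sketch. As a plausibility check, the weighted sum of the channel transfer factors A_(m)e^(jθ_(m)) is compared with R_(s)/(R_(s)+1), which is the attenuation that follows from inserting equation (12) into the channel model.

```python
import cmath
from typing import List, Sequence


def directional_weights(
    p_s: Sequence[float], sigma2: Sequence[float],
    c_ref: Sequence[complex], n: int
) -> List[complex]:
    """Weights of equation (12) for one bin.

    p_s[m]:    directional power P_S_m
    sigma2[m]: ambient power sigma_m^2 (must be non-zero)
    c_ref[m]:  cross correlation c_nm between reference channel n and channel m
    n:         zero-based index of the reference channel
    """
    par = [ps / s2 for ps, s2 in zip(p_s, sigma2)]
    r_s = sum(par)
    return [
        par[m] / (r_s + 1.0) if m == n else (c_ref[m] / sigma2[m]) / (r_s + 1.0)
        for m in range(len(p_s))
    ]


# Hypothetical model values; channel 0 is the reference (A = 1, theta = 0).
gains, phases = [1.0, 0.5, 2.0], [0.0, 0.3, -0.7]
p_s_ref, sigma2, n = 4.0, [1.0, 2.0, 4.0], 0

a = [g * cmath.exp(1j * t) for g, t in zip(gains, phases)]
p_s = [g * g * p_s_ref for g in gains]
c_ref = [a[n] * a[m].conjugate() * p_s_ref for m in range(3)]  # ideal c_nm

w = directional_weights(p_s, sigma2, c_ref, n)
r_s = sum(ps / s2 for ps, s2 in zip(p_s, sigma2))
signal_gain = sum(w_m * a_m for w_m, a_m in zip(w, a))
```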

Alternatively, band-wise estimation coefficients can be evaluated based on band-wise evaluated primary, ambient powers and cross correlations:

$\begin{matrix}{{w_{S_{n},b}} = \frac{{PAR}_{n,b}}{R_{s,b} + 1},\quad {w_{S_{m},b}} = \frac{c_{nm,b}/\sigma_{m,b}^{2}}{R_{s,b} + 1},\quad m \neq n,} & (13)\end{matrix}$

by defining band-wise PARs as PAR_(m,b)=P_(S_m,b)/σ_(m,b)² and the sum of band-wise PARs as R_(s,b)=Σ_(m=1)^(M) PAR_(m,b) in step 36. Accordingly, band-wise spectral estimation of the directional signal from the reference channel based on band-wise coefficients leads in step 37 to

$\begin{matrix}{{{\hat{S}}_{b}[i]} = {\sum\limits_{m = 1}^{M}w_{S_{m},b}X_{m}[i]},\quad \text{for}\ i \in \left\lbrack b_{l},b_{u} \right\rbrack.} & (14)\end{matrix}$

That is, for bins in the same frequency band the coefficients for spectral estimation are the same.

Given Ŝ[i], directional signals in other channels can be evaluated as

$\begin{matrix}{{{\hat{S}}_{m}[i]} = {A_{m}[i]e^{j\theta_{m}[i]}\hat{S}[i]} = {\frac{c_{mn}[i]}{P_{S}[i]}\hat{S}[i]},\quad m \neq n,} & (15)\end{matrix}$

according to equation (5). Their band-wise counterparts are evaluated in step 37 as

$\begin{matrix}{{{{\hat{S}}_{m,b}\lbrack i\rbrack} = {\frac{c_{{mn},b}}{P_{S,b}}{{\hat{S}}_{b}\lbrack i\rbrack}}},{{{for}\mspace{14mu} i} \in \left\lbrack {b_{l},b_{u}} \right\rbrack},{m \neq {n.}}} & (16)\end{matrix}$

It is obvious that all estimates solely depend on estimated powers and inter-channel cross correlations, while no explicit estimation of gains and angles like A_(m)[i] and θ_(m)[i] is necessary.

B.2.2 Ambient Signals

Linear spectral estimation for ambient signals is

$\hat{N}_{m^{\prime}}[i] = \sum\limits_{m = 1}^{M}w_{N_{m^{\prime}},m}[i]X_{m}[i].$

The estimation coefficients minimising the mean square estimation error become

$\begin{matrix}{{w_{N_{m^{\prime}},m^{\prime}}[i]} = \frac{1 + R_{s}[i] - {PAR}_{m^{\prime}}[i]}{R_{s}[i] + 1},\quad {w_{N_{m^{\prime}},m}[i]} = \frac{- c_{m^{\prime}m}[i]/\sigma_{m}^{2}[i]}{R_{s}[i] + 1},\quad m \neq m^{\prime}.} & (17)\end{matrix}$
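Equation (17) can be sketched for one frequency bin as follows. The function name `ambient_weights`, the zero-based indices and the model values are assumptions of this sketch. The check at the end uses the fact that, with ideal cross correlations, the directional component of channel m′ is attenuated by the factor 1/(R_(s)+1) in the ambient estimate.

```python
import cmath
from typing import List, Sequence


def ambient_weights(
    m_prime: int, par: Sequence[float], sigma2: Sequence[float],
    c: Sequence[Sequence[complex]]
) -> List[complex]:
    """Weights of equation (17) for estimating the ambient signal of channel m_prime.

    par[m]:    primary-to-ambient ratio PAR_m
    sigma2[m]: ambient power sigma_m^2 (must be non-zero)
    c[a][b]:   cross correlation c_ab between channels a and b
    """
    r_s = sum(par)
    return [
        (1.0 + r_s - par[m_prime]) / (r_s + 1.0) if m == m_prime
        else (-c[m_prime][m] / sigma2[m]) / (r_s + 1.0)
        for m in range(len(par))
    ]


# Hypothetical model values, for illustration only.
gains, phases = [1.0, 0.5, 2.0], [0.0, 0.3, -0.7]
p_s_ref, sigma2 = 4.0, [1.0, 2.0, 4.0]

a = [g * cmath.exp(1j * t) for g, t in zip(gains, phases)]
par = [g * g * p_s_ref / s2 for g, s2 in zip(gains, sigma2)]
c = [[a[i] * a[k].conjugate() * p_s_ref for k in range(3)] for i in range(3)]

m_prime = 1
w_n = ambient_weights(m_prime, par, sigma2, c)
leak = sum(w_m * a_m for w_m, a_m in zip(w_n, a))  # residual directional gain
r_s = sum(par)
```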

Similarly as before, band-wise weights can be evaluated as

$\begin{matrix}{{w_{N_{m^{\prime}},m^{\prime},b} = \frac{1 + R_{s,b} - {PAR}_{m^{\prime},b}}{R_{s,b} + 1}},{w_{N_{m^{\prime}},m,b} = \frac{{- c_{{m^{\prime}m},b}}\text{/}\sigma_{m,b}^{2}}{R_{s,b} + 1}},{m \neq {m^{\prime}.}}} & (18)\end{matrix}$

Ambient spectral estimation based on band-wise coefficients is carried out in step 37 as

$\begin{matrix}{{{\hat{N}}_{m^{\prime},b}[i]} = {\sum\limits_{m = 1}^{M}w_{N_{m^{\prime}},m,b}X_{m}[i]},\quad \text{for}\ i \in \left\lbrack b_{l},b_{u} \right\rbrack.} & (19)\end{matrix}$

Again, all estimates only depend on estimated powers and inter-channel cross correlations, while no explicit estimation of gains and angles for individual channels is necessary.

B.3 Post-Scaling

To maintain directional and ambient powers before and after decomposition, a post-scaling is performed in step 38. The directional power from the reference channel after linear spectral estimation is evaluated by

$\begin{matrix}{{P_{\hat{S}}\lbrack i\rbrack} = {{E\left\{ {{\hat{S}\lbrack i\rbrack}{{\hat{S}}^{*}\lbrack i\rbrack}} \right\}} = {\frac{R_{s}\lbrack i\rbrack}{{R_{s}\lbrack i\rbrack} + 1}{{P_{S}\lbrack i\rbrack}.}}}} & (20)\end{matrix}$

The ambient power after linear spectral estimation is determined as

$\begin{matrix}{{{P_{{\hat{N}}_{m}}\lbrack i\rbrack} = {\left( {1 - \frac{{PAR}_{m}\lbrack i\rbrack}{1 + {R_{s}\lbrack i\rbrack}}} \right){\sigma_{m}^{2}\lbrack i\rbrack}}},{1 \leq m \leq {M.}}} & (21)\end{matrix}$

According to equations (20) and (21), directional and ambient powers are statistically attenuated due to linear spectral estimation. To undo this attenuation, post-scaling is carried out as

$\begin{matrix}{\begin{aligned}{{\hat{S}}^{\prime}[i]} &= {\sqrt{\frac{P_{S}[i]}{P_{\hat{S}}[i]}}\,\hat{S}[i]} = {\sqrt{\frac{R_{s}[i] + 1}{R_{s}[i]}}\,\hat{S}[i]}, \\ {{\hat{S}}_{m}^{\prime}[i]} &= {\frac{c_{mn}[i]}{P_{S}[i]}\,{\hat{S}}^{\prime}[i]},\quad m \neq n, \\ {{\hat{N}}_{m}^{\prime}[i]} &= {\sqrt{\frac{\sigma_{m}^{2}[i]}{P_{{\hat{N}}_{m}}[i]}}\,{\hat{N}}_{m}[i]} = {\sqrt{\frac{1 + R_{s}[i]}{1 + R_{s}[i] - {PAR}_{m}[i]}}\,{\hat{N}}_{m}[i]}.\end{aligned}} & (22)\end{matrix}$
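The scaling factors in equation (22) depend only on the PARs. The sketch below computes them for one frequency bin and checks that they exactly undo the attenuation factors of equations (20) and (21). The function name `post_scaling_gains` and the PAR values are assumptions of this sketch.

```python
import math
from typing import List, Sequence, Tuple


def post_scaling_gains(par: Sequence[float]) -> Tuple[float, List[float]]:
    """Return the directional gain and the per-channel ambient gains of equation (22).

    Undefined when the sum of PARs is zero, i.e. when no directional
    power is present; a practical system would have to handle that case.
    """
    r_s = sum(par)
    g_s = math.sqrt((r_s + 1.0) / r_s)
    g_n = [math.sqrt((1.0 + r_s) / (1.0 + r_s - p)) for p in par]
    return g_s, g_n


# Hypothetical PAR values, for illustration only.
par = [4.0, 0.5, 4.0]
g_s, g_n = post_scaling_gains(par)
r_s = sum(par)

# Attenuation factors of equations (20) and (21), expressed as power ratios.
att_s = r_s / (r_s + 1.0)
att_n = [1.0 - p / (1.0 + r_s) for p in par]
```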

If band-wise estimation coefficients are used for the spectralestimation, band-wise powers can be defined by

$\begin{matrix}{{P_{\hat{S},b} = {\frac{R_{s,b}}{R_{s,b} + 1}P_{S,b}}},{P_{{\hat{N}}_{m},b} = {\left( {1 - \frac{{PAR}_{m,b}}{1 + R_{s,b}}} \right)\sigma_{m,b}^{2}}},} & (23)\end{matrix}$and the post-scaling is performed for i∈[b_(l),b_(u)] by

$\begin{matrix}{\begin{aligned}{{\hat{S}}_{b}^{\prime}[i]} &= {\sqrt{\frac{P_{S,b}}{P_{\hat{S},b}}}\,{\hat{S}}_{b}[i]} = {\sqrt{\frac{R_{s,b} + 1}{R_{s,b}}}\,{\hat{S}}_{b}[i]}, \\ {{\hat{S}}_{m,b}^{\prime}[i]} &= {\frac{c_{mn,b}}{P_{S,b}}\,{\hat{S}}_{b}^{\prime}[i]},\quad m \neq n, \\ {{\hat{N}}_{m,b}^{\prime}[i]} &= {\sqrt{\frac{\sigma_{m,b}^{2}}{P_{{\hat{N}}_{m},b}}}\,{\hat{N}}_{m,b}[i]} = {\sqrt{\frac{1 + R_{s,b}}{1 + R_{s,b} - {PAR}_{m,b}}}\,{\hat{N}}_{m,b}[i]}.\end{aligned}} & (24)\end{matrix}$

The flow chart in FIG. 3 illustrates the multi-channel primary-ambient decomposition employing band-wise coefficients for linear spectral estimation and post-scaling. A related block diagram employing bin-wise coefficients has a corresponding structure, as follows from the derivation process.

C. Directional Signal and Directional Information

Given estimated directional signals from individual channels {Ŝ′_(m)[i],1≤m≤3}, a total directional signal and its direction can be derived, which can be used for HOA encoding and rendering. This is the inverse problem to reproduction of directional sound via loudspeakers, where individual feeds for loudspeakers are derived from a directional signal. For loudspeakers located in the horizontal plane, a tangent panning law is known, see [5] and [2]. For three-dimensional panning, vector based amplitude panning (VBAP) can be applied, cf. [5], or its generalisation can be applied, cf. [1].

In the following, it is shown how to derive the total directional signal by applying the principle of VBAP, while the principle shown in [1] can be employed similarly.

C.1 Horizontal Plane Case

A three-channel case as depicted in FIG. 4 is considered, where three channels are located on the horizontal plane. Without loss of generality, the first channel serves as reference channel. After decomposition, directional signals are estimated as Ŝ′₁[i], Ŝ′₂[i], Ŝ′₃[i].

A total directional signal can be derived by two successive steps. First, a directional signal located between the first and second channels is determined, which is denoted as S₁₂[i]. After that, S₁₂[i] is combined with Ŝ′₃[i] in order to derive the total directional signal. Based on the estimated directional powers P_(S_1)[i] and P_(S_2)[i], a panning angle for the first and second channels can be determined by means of the tangent law according to [5] and [2]:

$\begin{matrix}{{{\xi_{12}\lbrack i\rbrack} = {\tan^{- 1}\left( {{\tan\left( \phi_{R} \right)}\frac{\sqrt{P_{S_{1}}\lbrack i\rbrack} - \sqrt{P_{S_{2}}\lbrack i\rbrack}}{\sqrt{P_{S_{1}}\lbrack i\rbrack} + \sqrt{P_{S_{2}}\lbrack i\rbrack}}} \right)}},} & (25)\end{matrix}$where

$\phi_{R} = {\phi_{1} - \frac{1}{2}\left( \phi_{1} + \phi_{2} \right)} \in \left\lbrack 0,\frac{\pi}{2} \right\rbrack.$

ϕ₁ and ϕ₂ denote azimuth angles for the first and second loudspeakers, respectively. For P_(S_1)[i]>>P_(S_2)[i], ξ₁₂[i]→ϕ_(R), and for P_(S_2)[i]>>P_(S_1)[i], ξ₁₂[i]→−ϕ_(R). The directional signal S₁₂[i] and its direction are then given as

$\begin{matrix}{{S_{12}[i]} = {\sqrt{1 + \frac{P_{S_{2}}[i]}{P_{S_{1}}[i]}}\,{\hat{S}}^{\prime}[i]},\quad {\phi_{12}[i]} = {\xi_{12}[i] + \frac{\phi_{1} + \phi_{2}}{2}}.} & (26)\end{matrix}$
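The direction part of equations (25) and (26) can be sketched as follows for a single frequency bin. The function name `panning_direction` and the example angles are assumptions of this sketch; angles are in radians.

```python
import math
from typing import Tuple


def panning_direction(
    p_s1: float, p_s2: float, phi1: float, phi2: float
) -> Tuple[float, float]:
    """Return (xi_12, phi_12) according to equations (25) and (26).

    p_s1, p_s2: directional powers of the first and second channel
                (must not both be zero)
    phi1, phi2: azimuth angles of the first and second loudspeaker
    """
    phi_r = phi1 - 0.5 * (phi1 + phi2)          # reference angle
    g1, g2 = math.sqrt(p_s1), math.sqrt(p_s2)   # amplitude-like gains
    xi = math.atan(math.tan(phi_r) * (g1 - g2) / (g1 + g2))
    return xi, xi + 0.5 * (phi1 + phi2)


# Hypothetical loudspeaker azimuths of +30 and -30 degrees.
phi1, phi2 = math.radians(30.0), math.radians(-30.0)
_, centre = panning_direction(1.0, 1.0, phi1, phi2)   # equal powers
_, at_first = panning_direction(1.0, 0.0, phi1, phi2)  # only first channel
_, at_second = panning_direction(0.0, 1.0, phi1, phi2)  # only second channel
```

Equal powers place the directional signal midway between the two loudspeakers, while a single active channel places it at that loudspeaker.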

Similarly, S₁₂[i] is combined with Ŝ′₃[i] to derive the total directional signal and its direction. The panning angle is determined as

$\begin{matrix}{{\xi_{123}[i]} = {\tan^{- 1}\left( {\tan\left( \phi_{R,3}[i] \right)}\frac{\sqrt{P_{S_{1}}[i] + P_{S_{2}}[i]} - \sqrt{P_{S_{3}}[i]}}{\sqrt{P_{S_{1}}[i] + P_{S_{2}}[i]} + \sqrt{P_{S_{3}}[i]}} \right)},} & (27)\end{matrix}$

where the bin-wise reference angle is ϕ_(R,3)[i]=½(ϕ₁₂[i]−ϕ₃), with ϕ₃ denoting the azimuth angle corresponding to the third loudspeaker. Consequently, the final directional signal and its direction are obtained as

$\begin{matrix}{{S_{123}[i]} = {\sqrt{1 + \frac{P_{S_{2}}[i]}{P_{S_{1}}[i]} + \frac{P_{S_{3}}[i]}{P_{S_{1}}[i]}}\,{\hat{S}}^{\prime}[i]},\quad {\phi_{123}[i]} = {\xi_{123}[i] + \frac{\phi_{12}[i] + \phi_{3}}{2}}.} & (28)\end{matrix}$

This successive approach for evaluating panning angles and the direction of the total directional signal can be applied for multi-channel cases with more than three channels, if directions of multi-channel signals are all on the horizontal plane.

C.2 Three-Dimensional Case

In the three-channel case, with channel positions now located on a unit sphere, channel positions can be represented by a unit vector with Cartesian coordinates as its elements, denoted as p₁, p₂, and p₃. The bin-wise position (direction) of the total directional signal on the unit sphere can be determined as

$\begin{matrix}{{p\lbrack i\rbrack} = {\frac{1}{\sqrt{{P_{S_{1}}\lbrack i\rbrack} + {P_{S_{2}}\lbrack i\rbrack} + {P_{S_{3}}\lbrack i\rbrack}}}{\left( {{p_{1}\sqrt{P_{S_{1}}\lbrack i\rbrack}} + {p_{2}\sqrt{P_{S_{2}}\lbrack i\rbrack}} + {p_{3}\sqrt{P_{S_{3}}\lbrack i\rbrack}}} \right).}}} & (29)\end{matrix}$

That is, the direction determination of the total directional signal for three-channel cases is the inverse problem of VBAP. For two channels that are not located on the horizontal plane, the direction can similarly be determined as

$\begin{matrix}{{p\lbrack i\rbrack} = {\frac{1}{\sqrt{{P_{S_{1}}\lbrack i\rbrack} + {P_{S_{2}}\lbrack i\rbrack}}}{\left( {{p_{1}\sqrt{P_{S_{1}}\lbrack i\rbrack}} + {p_{2}\sqrt{P_{S_{2}}\lbrack i\rbrack}}} \right).}}} & (30)\end{matrix}$

Therefore, for cases with more than three channels, equations (29) and (30) can be applied successively for determining the direction of the total directional signal. In an example with four channels with p₁, p₂, p₃ and p₄ as channel position vectors, the direction evaluation can be accomplished in two steps. Firstly, the direction summarising the first three directional signals from the first three channels can be determined as

$\begin{matrix}{{p_{123}[i]} = {\frac{1}{\sqrt{P_{S_{1}}[i] + P_{S_{2}}[i] + P_{S_{3}}[i]}}\left( {p_{1}\sqrt{P_{S_{1}}[i]}} + {p_{2}\sqrt{P_{S_{2}}[i]}} + {p_{3}\sqrt{P_{S_{3}}[i]}} \right)}} & (31)\end{matrix}$

with the corresponding directional power P_(S_123)[i]=P_(S_1)[i]+P_(S_2)[i]+P_(S_3)[i]. Next, the final direction summarising four directional signals can be calculated by applying equation (30):

${p[i]} = {\frac{1}{\sqrt{P_{S_{123}}[i] + P_{S_{4}}[i]}}\left( {p_{123}[i]\sqrt{P_{S_{123}}[i]}} + {p_{4}\sqrt{P_{S_{4}}[i]}} \right)},$

with the corresponding directional power as P_(S)[i]=P_(S_1)[i]+P_(S_2)[i]+P_(S_3)[i]+P_(S_4)[i].
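The power-weighted combination of position vectors in equations (29) to (31) can be sketched with one helper that accepts any number of channels. The function name `combine_directions`, the position vectors and the powers are assumptions of this sketch. The last lines compare the two-step evaluation described above with a single evaluation over all four channels, which follows algebraically from the same formula.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, ...]


def combine_directions(positions: Sequence[Vec3], powers: Sequence[float]) -> Vec3:
    """Power-weighted combination of position vectors, as in equations (29) and (30).

    positions: Cartesian position vectors p_m
    powers:    directional powers P_S_m (their sum must be non-zero)
    """
    norm = math.sqrt(sum(powers))
    return tuple(
        sum(p[k] * math.sqrt(w) for p, w in zip(positions, powers)) / norm
        for k in range(3)
    )


# Hypothetical unit position vectors and directional powers.
p1, p2, p3, p4 = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (0.6, 0.8, 0.0)
pw = [1.0, 4.0, 2.0, 3.0]

p123 = combine_directions([p1, p2, p3], pw[:3])                      # equation (31)
successive = combine_directions([p123, p4], [sum(pw[:3]), pw[3]])    # equation (30)
direct = combine_directions([p1, p2, p3, p4], pw)                    # single step
```

Note that the vector produced by this formula has unit length only when the cross terms between the weighted position vectors vanish, so an implementation that needs a strict unit vector may have to renormalise it; that step is not part of the equations above.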

Replacing bin-wise estimates with their band-wise counterparts, the total directional signal and its direction can be determined similarly.

D. Conversion to HOA

Based on derived directional signal S₁₂₃[i] and its correspondingbin-wise directional information ϕ₁₂₃[i] for the horizontal plane caseor p₁₂₃[i] for the 3D case, HOA encoding in frequency domain can becarried out in step or stage 25 in FIG. 2 asb _(S)[i]=S ₁₂₃[i]y(Ω_(S)[i]),  (32)where Ω_(S)[i] denotes direction according to ϕ₁₂₃[i] or p₁₂₃[i] andy(Ω_(S)[i]) is the mode vector dependent on Ω_(S)[i], see section E. HOAbasics for its definition. For band-wise approaches, Ω_(S)[i] is thesame for all frequency bins within a same frequency band.

For the ambient signals $\{\hat{N}'_{m}[i]\}$, HOA encoding is carried out in step or stage 24 in FIG. 2 as

$\begin{matrix}{b_{N,m}[i] = \{\hat{N}'_{m}[i]\}\, y(\Omega_{m}),} & (33)\end{matrix}$

where $\Omega_{m}$ is the channel position of the m-th channel. Consequently, the frequency-domain HOA coefficients for the considered triplet can be evaluated in step or stage 27 as

$\begin{matrix}{b[i] = b_{S}[i] + \sum_{m=1}^{3} b_{N,m}[i].} & (34)\end{matrix}$

Finally, combining all HOA coefficients from the individual triplets completes the conversion from channel signals to HOA signals. The frequency-domain HOA signal is then transformed back into the time domain in step or stage 26.
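Equations (32) to (34) can be sketched for a single triplet as follows. This is an illustrative sketch only: it is restricted to HOA order 1, it uses the closed-form first-order N3D harmonics that follow from the definitions in section E.1, and the function names `mode_vector_order1` and `encode_triplet` as well as the array layouts are assumptions made for this example.

```python
import numpy as np

def mode_vector_order1(theta, phi):
    """First-order mode vector [Y_0^0, Y_1^-1, Y_1^0, Y_1^1] (N3D).

    theta is the inclination measured from the z axis, phi the azimuth.
    """
    s3 = np.sqrt(3.0)
    return np.array([1.0,
                     s3 * np.sin(theta) * np.sin(phi),
                     s3 * np.cos(theta),
                     s3 * np.sin(theta) * np.cos(phi)])

def encode_triplet(S, dir_angles, N_amb, channel_angles):
    """Frequency-domain HOA coefficients of one triplet.

    S:              (n_bins,) complex total directional spectrum S_123[i].
    dir_angles:     (n_bins, 2) per-bin direction (theta, phi), Omega_S[i].
    N_amb:          (3, n_bins) complex ambient spectra of the three channels.
    channel_angles: (3, 2) channel positions (theta, phi), Omega_m.
    Returns b with shape (n_bins, 4).
    """
    # Equation (32): directional part, one mode vector per frequency bin.
    y_dir = np.stack([mode_vector_order1(t, p) for t, p in dir_angles])
    b = S[:, None] * y_dir
    # Equations (33) and (34): add each ambient signal at its channel position.
    for m in range(3):
        y_m = mode_vector_order1(*channel_angles[m])
        b = b + N_amb[m][:, None] * y_m[None, :]
    return b
```

For a band-wise approach, the same sketch applies if every row of `dir_angles` within a band carries the same direction.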

E. HOA Basics

Higher Order Ambisonics (HOA) is based on the description of a sound field within a compact area of interest, which is assumed to be free of sound sources, cf. e.g. sections 12 Higher Order Ambisonics (HOA) and C.5 HOA Encoder in [13]. In that case the spatio-temporal behaviour of the sound pressure $p(t,\hat{\Omega})$ at time t and position $\hat{\Omega}$ within the area of interest is physically fully determined by the homogeneous wave equation. In the following a spherical coordinate system as shown in FIG. 5 is assumed. In this coordinate system the x axis points to the frontal position, the y axis points to the left, and the z axis points to the top. A position in space $\hat{\Omega} = (r,\theta,\phi)^{T}$ is represented by a radius r>0 (i.e. the distance to the coordinate origin), an inclination angle θ∈[0,π] measured from the polar axis z and an azimuth angle ϕ∈[0,2π] measured counter-clockwise in the x-y plane from the x axis. Further, $(\cdot)^{T}$ denotes the transposition.

Then it can be shown [11] that the Fourier transform of the sound pressure with respect to time, denoted by $\mathcal{F}_{t}(\cdot)$, i.e.

$P(\omega,\hat{\Omega}) = \mathcal{F}_{t}\left( p(t,\hat{\Omega}) \right) = \int_{-\infty}^{\infty} p(t,\hat{\Omega})\, e^{-i\omega t}\, dt$

with ω denoting the angular frequency and i indicating the imaginary unit, can be expanded into a series of Spherical Harmonics according to

$P(\omega = kc_{s}, r, \theta, \phi) = \sum_{n=0}^{N}\sum_{m=-n}^{n} A_{n}^{m}(k)\, j_{n}(kr)\, Y_{n}^{m}(\theta,\phi).$

Here $c_{s}$ denotes the speed of sound and k denotes the angular wave number, which is related to the angular frequency ω by

$k = \frac{\omega}{c_{s}}.$

Further, $j_{n}(\cdot)$ denote the spherical Bessel functions of the first kind and $Y_{n}^{m}(\theta,\phi)$ denote the real-valued Spherical Harmonics of order n and degree m, which are defined below. The expansion coefficients $A_{n}^{m}(k)$ only depend on the angular wave number k. Thereby it has been implicitly assumed that the sound pressure is spatially band-limited. Thus the series is truncated with respect to the order index n at an upper limit N, which is called the order of the HOA representation.

If the sound field is represented by a superposition of an infinite number of harmonic plane waves of different angular frequencies ω and arriving from all possible directions specified by the angle tuple (θ,ϕ), it can be shown [12] that the respective plane wave complex amplitude function B(ω,θ,ϕ) can be expressed by the following Spherical Harmonics expansion

$B(\omega = kc_{s},\theta,\phi) = \sum_{n=0}^{N}\sum_{m=-n}^{n} B_{n}^{m}(k)\, Y_{n}^{m}(\theta,\phi),$

where the expansion coefficients $B_{n}^{m}(k)$ are related to the expansion coefficients $A_{n}^{m}(k)$ by $A_{n}^{m}(k) = i^{n} B_{n}^{m}(k)$.

Assuming that the individual coefficients $B_{n}^{m}(k = \omega/c_{s})$ are functions of the angular frequency ω, the application of the inverse Fourier transform (denoted by $\mathcal{F}_{t}^{-1}(\cdot)$) provides time domain functions

$b_{n}^{m}(t) = \mathcal{F}_{t}^{-1}\left( B_{n}^{m}(\omega/c_{s}) \right) = \frac{1}{2\pi}\int_{-\infty}^{\infty} B_{n}^{m}\left( \frac{\omega}{c_{s}} \right) e^{i\omega t}\, d\omega$

for each order n and degree m, which can be collected in a single vector b(t) by

$b(t) = \left[ b_{0}^{0}(t)\;\; b_{1}^{-1}(t)\;\; b_{1}^{0}(t)\;\; b_{1}^{1}(t)\;\; b_{2}^{-2}(t)\;\; b_{2}^{-1}(t)\;\; b_{2}^{0}(t)\;\; b_{2}^{1}(t)\;\; b_{2}^{2}(t)\;\; \ldots\;\; b_{N}^{N-1}(t)\;\; b_{N}^{N}(t) \right]^{T}.$

The position index of a time domain function $b_{n}^{m}(t)$ within vector b(t) is given by n(n+1)+1+m. The overall number of elements in vector b(t) is given by O=(N+1)².
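The index rule above can be illustrated with a short sketch. The function names are hypothetical; the inverse mapping is not stated in the text and is derived here from the fact that the positions of order n occupy the range n²+1 to (n+1)².

```python
import math

def hoa_position(n, m):
    """1-based position of b_n^m(t) within b(t): n(n+1)+1+m."""
    if n < 0 or not -n <= m <= n:
        raise ValueError("require n >= 0 and -n <= m <= n")
    return n * (n + 1) + 1 + m

def hoa_order_degree(position):
    """Inverse mapping (derived for this sketch): position -> (n, m)."""
    if position < 1:
        raise ValueError("positions are 1-based")
    n = math.isqrt(position - 1)
    m = position - 1 - n * (n + 1)
    return n, m

def num_coefficients(N):
    """Overall number of elements O = (N+1)^2 for HOA order N."""
    return (N + 1) ** 2
```

For example, `hoa_position(1, -1)` gives 2 and `hoa_position(N, N)` equals `num_coefficients(N)`, so the last element of b(t) is the one with n = m = N.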

The final Ambisonics format provides the sampled version of b(t) using a sampling frequency $f_{S}$ as

$\{ b(lT_{S}) \}_{l\in\mathbb{N}} = \{ b(T_{S}), b(2T_{S}), b(3T_{S}), b(4T_{S}), \ldots \},$

where $T_{S} = 1/f_{S}$ denotes the sampling period. The elements of $b(lT_{S})$ are here referred to as Ambisonics coefficients. The time domain signals $b_{n}^{m}(t)$ and hence the Ambisonics coefficients are real-valued.

E.1 Definition of Real Valued Spherical Harmonics

The real-valued spherical harmonics $Y_{n}^{m}(\theta,\phi)$ (assuming N3D normalisation) are given by

$Y_{n}^{m}(\theta,\phi) = \sqrt{(2n+1)\frac{(n-|m|)!}{(n+|m|)!}}\; P_{n,|m|}(\cos\theta)\; \mathrm{trg}_{m}(\phi)$

with

$\mathrm{trg}_{m}(\phi) = \left\{ \begin{matrix}{\sqrt{2}\cos(m\phi)} & {m > 0} \\ 1 & {m = 0} \\ {-\sqrt{2}\sin(m\phi)} & {m < 0}\end{matrix} \right.$

The associated Legendre functions $P_{n,m}(x)$ are defined as

$P_{n,m}(x) = \left( 1 - x^{2} \right)^{m/2}\frac{d^{m}}{dx^{m}}P_{n}(x),\quad m \geq 0$

with the Legendre polynomial $P_{n}(x)$ and without the Condon-Shortley phase term $(-1)^{m}$.
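As a worked illustration of the definitions above (not an additional definition), the harmonics of orders n=0 and n=1 can be evaluated by hand. With $P_{0,0}(x) = 1$, $P_{1,0}(x) = x$ and $P_{1,1}(x) = (1 - x^{2})^{1/2}$, so that $P_{1,1}(\cos\theta) = \sin\theta$ for θ∈[0,π], and with $\mathrm{trg}_{-1}(\phi) = -\sqrt{2}\sin(-\phi) = \sqrt{2}\sin\phi$, one obtains

$\begin{matrix}{Y_{0}^{0}(\theta,\phi) = 1,} \\ {Y_{1}^{-1}(\theta,\phi) = \sqrt{3\cdot\tfrac{0!}{2!}}\,\sin\theta\,\sqrt{2}\sin\phi = \sqrt{3}\,\sin\theta\,\sin\phi,} \\ {Y_{1}^{0}(\theta,\phi) = \sqrt{3}\,\cos\theta,} \\ {Y_{1}^{1}(\theta,\phi) = \sqrt{3\cdot\tfrac{0!}{2!}}\,\sin\theta\,\sqrt{2}\cos\phi = \sqrt{3}\,\sin\theta\,\cos\phi.}\end{matrix}$

With the coordinate system of FIG. 5, these three first-order harmonics are proportional to the y, z and x components of the unit direction vector, respectively.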

E.2 Definition of the Mode Matrix

The mode matrix $\Psi^{(N_{1},N_{2})}$ of order N₁ with respect to the directions $\Omega_{q}^{(N_{2})}$, q=1, . . . , O₂=(N₂+1)², related to order N₂ is defined by

$\Psi^{(N_{1},N_{2})} := \left[ y_{1}^{(N_{1})}\;\; y_{2}^{(N_{1})}\;\; \ldots\;\; y_{O_{2}}^{(N_{1})} \right] \in \mathbb{R}^{O_{1}\times O_{2}}$

with

$y_{q}^{(N_{1})} := \left[ Y_{0}^{0}(\Omega_{q}^{(N_{2})})\;\; Y_{1}^{-1}(\Omega_{q}^{(N_{2})})\;\; Y_{1}^{0}(\Omega_{q}^{(N_{2})})\;\; Y_{1}^{1}(\Omega_{q}^{(N_{2})})\;\; Y_{2}^{-2}(\Omega_{q}^{(N_{2})})\;\; Y_{2}^{-1}(\Omega_{q}^{(N_{2})})\;\; \ldots\;\; Y_{N_{1}}^{N_{1}}(\Omega_{q}^{(N_{2})}) \right]^{T} \in \mathbb{R}^{O_{1}}$

denoting the mode vector of order N₁ with respect to the directions $\Omega_{q}^{(N_{2})}$, where O₁=(N₁+1)².
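The definitions of sections E.1 and E.2 can be sketched in code as follows. This is a minimal, unoptimised sketch: the function names are hypothetical, the derivative of the Legendre polynomial is taken with NumPy's `numpy.polynomial.legendre.Legendre` class, and directions are assumed to be given as (θ, ϕ) pairs in radians.

```python
import math
import numpy as np
from numpy.polynomial.legendre import Legendre

def assoc_legendre(n, m, x):
    """P_{n,m}(x) = (1-x^2)^(m/2) d^m/dx^m P_n(x), m >= 0, no (-1)^m phase."""
    poly = Legendre.basis(n)
    if m > 0:
        poly = poly.deriv(m)
    return (1.0 - x * x) ** (m / 2.0) * poly(x)

def real_sh(n, m, theta, phi):
    """Real-valued, N3D-normalised spherical harmonic Y_n^m(theta, phi)."""
    am = abs(m)
    norm = math.sqrt((2 * n + 1) * math.factorial(n - am) / math.factorial(n + am))
    if m > 0:
        trg = math.sqrt(2.0) * math.cos(m * phi)
    elif m == 0:
        trg = 1.0
    else:
        trg = -math.sqrt(2.0) * math.sin(m * phi)
    return norm * assoc_legendre(n, am, math.cos(theta)) * trg

def mode_vector(N1, theta, phi):
    """Mode vector of order N1 with O1 = (N1+1)^2 entries, ordered as b(t)."""
    return np.array([real_sh(n, m, theta, phi)
                     for n in range(N1 + 1)
                     for m in range(-n, n + 1)])

def mode_matrix(N1, directions):
    """Mode matrix with one mode vector per column, shape (O1, len(directions))."""
    return np.column_stack([mode_vector(N1, t, p) for t, p in directions])
```

For a frontal direction (θ=π/2, ϕ=0), `mode_vector(1, math.pi / 2, 0.0)` is approximately [1, 0, 0, √3], in line with the x axis pointing to the front.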

The described processing can be carried out by a single processor or electronic circuit, or by several processors or electronic circuits operating in parallel and/or operating on different parts of the complete processing.

The instructions for operating the processor or the processors according to the described processing can be stored in one or more memories. The at least one processor is configured to carry out these instructions.

REFERENCES

-   [1] A. Ando, K. Hamasaki, “Sound intensity-based three dimensional panning”, Proceedings of the 126th AES Convention, Munich, May 2009
-   [2] Ch. Faller, “Multiple-Loudspeaker Playback of Stereo Signals”, J. Audio Eng. Soc., vol. 54, 2006, pp. 1051-1064
-   [3] Ch. Faller, F. Baumgarte, “Binaural cue coding, part II: Schemes and applications”, IEEE Transactions on Speech and Audio Processing, vol. 11, 2003, pp. 520-531
-   [4] J. Merimaa, M. M. Goodwin, J.-M. Jot, “Correlation-based ambience extraction from stereo recordings”, 123rd Convention of the Audio Eng. Soc., New York, 2007
-   [5] V. Pulkki, “Virtual sound source positioning using vector base amplitude panning”, J. Audio Eng. Soc., vol. 45, no. 6, June 1997, pp. 456-466
-   [6] J. Thompson, B. Smith, A. Warner, J.-M. Jot, “Direct-diffuse decomposition of multichannel signals using a system of pairwise correlations”, 133rd Convention of the Audio Eng. Soc., San Francisco, 2012
-   [7] B. Delaunay, “Sur la Sphère Vide”, Bulletin de l'Académie des Sciences de l'URSS, 1934, vol. 1, pp. 793-800
-   [8] C. B. Barber, D. P. Dobkin, H. Huhdanpaa, “The Quickhull Algorithm for Convex Hulls”, ACM Transactions on Mathematical Software, 1996, vol. 22, pp. 469-483
-   [9] http://www.barco.com/projection_systems/downloads/Auro-3D_v3.pdf
-   [10] http://www.nhk.or.jp/strl/publica/bt/en/fe0045-6.pdf
-   [11] E. G. Williams, “Fourier Acoustics”, 1999, vol. 93 of Applied Mathematical Sciences, Academic Press
-   [12] B. Rafaely, “Plane-wave Decomposition of the Sound Field on a Sphere by Spherical Convolution”, J. Acoust. Soc. Am., 2004, vol. 116(4), pp. 2149-2157
-   [13] ISO/IEC IS 23008-3

The invention claimed is:
 1. A method for converting a channel-based 3D audio signal to a higher-order Ambisonics HOA audio signal, said method including: if said channel-based 3D audio signal is in time domain, transforming said channel-based 3D audio signal from time domain to frequency domain; carrying out a primary ambient decomposition for three-channel triplets of blocks of said frequency domain channel-based 3D audio signal, wherein related directional signals and ambient signals are provided for each triplet, and wherein said primary ambient decomposition includes a directional and ambient power estimation, a linear spectral estimation based on minimum mean square error principle, and a post-scaling of the estimated spectra such that power maintenance is achieved; from said directional signals, deriving directional information of a total directional signal for each triplet; HOA encoding said total directional signal according to said derived directions, and HOA encoding ambient signals according to channel positions; superimposing HOA coefficients of said HOA encoded directional signal and HOA coefficients of said HOA encoded ambient signal in order to obtain an HOA coefficients signal for said channel-based 3D audio signal; transforming said HOA coefficients signal to time domain.
 2. The method of claim 1, wherein windowing and overlapping is carried out in connection with said transform from time domain to frequency domain, while windowing and overlap-add is carried out in connection with said transform from frequency domain to time domain.
 3. The method of claim 1, wherein, in case there are more than three channels, a triangulation is performed in that channels of said channel-based 3D audio signal are divided into non-overlapping triangles or triplets with three-channel positions as vertices.
 4. The method of claim 3, wherein in case the channel positions of said channel-based 3D audio signal are given in 3D space on a unit sphere, said triangulation is accomplished by means of a Delaunay triangulation using the Quickhull algorithm.
 5. The method of claim 1, wherein said primary ambient decomposition for said triplets is carried out successively and the decomposition order is carried out according to triplet powers, such that a triplet with a higher total power is decomposed earlier than a triplet with a lower total power, wherein the total power is the sum of three channel powers belonging to a triplet.
 6. The method of claim 1, wherein based on the decomposition order, said primary ambient decomposition is carried out for individual triplets, thereby delivering directional and ambient signals of three channels, and wherein three directional signals are combined to a total directional signal according to the principle of summing localisation, while the directions are derived by means of panning laws.
 7. The method of claim 1, wherein said primary ambient decomposition includes: calculating, for a block ($X_{m}[i]$) of multichannel spectral bins, signal powers $P_{m}[i]$ and inter-channel cross correlations $c_{mn}[i]$ between different channel signals, wherein 1≤m≤3 denotes a specific triplet after triangulation, m,n denote two different channels and i denotes a frequency bin index; calculating a directional signal power $P_{S_{m}}[i] = \frac{|c_{mn_{1}}[i]|\,|c_{mn_{2}}[i]|}{|c_{n_{1}n_{2}}[i]|}$, m≠n₁, m≠n₂, n₁≠n₂, 1≤m, n₁, n₂≤3, wherein $c_{n_{1}n_{2}}[i]$ is the cross correlation for the i-th frequency bin between channel n₁ and channel n₂, which both are different from channel m; if calculated said signal power $P_{m}[i]$ is smaller than directional power $P_{S_{m}}[i]$, post-processing said directional power $P_{S_{m}}[i]$ such that it is less than $P_{m}[i]$ and approaches $P_{S_{m}}[i]$ as far as possible; calculating a band signal power $P_{m,b}$, a band-wise inter-channel cross correlation $c_{mn,b}$, a directional band power $P_{S_{m},b}$ and an ambient band power $\sigma_{m,b}^{2} = P_{m,b} - P_{S_{m},b}$, wherein b denotes a band; calculating a primary-to-ambient ratio $\mathrm{PAR}_{m}[i] = P_{S_{m}}[i]/\sigma_{m}^{2}[i]$ for each individual channel and their sum $R_{s}[i] = \sum_{m=1}^{M}\mathrm{PAR}_{m}[i]$, or calculating a primary-to-ambient ratio $\mathrm{PAR}_{m,b} = P_{S_{m},b}/\sigma_{m,b}^{2}$ for each individual band and their sum $R_{s,b} = \sum_{m=1}^{M}\mathrm{PAR}_{m,b}$; estimating directional and ambient signal spectra based on $\mathrm{PAR}_{m}[i]$ and $c_{mn}[i]$, or based on $\mathrm{PAR}_{m,b}$ and $c_{mn,b}$, respectively; scaling said estimated directional and ambient signal spectra such that an attenuation caused by said spectral estimation is reversed.
 8. Digital audio signal that is generated according to the method of claim 1.
 9. An apparatus for converting a channel-based 3D audio signal to a higher-order Ambisonics HOA audio signal, said apparatus including at least a processor, wherein the at least one processor includes: if said channel-based 3D audio signal is in time domain, a transform stage configured to transform said channel-based 3D audio signal from time domain to frequency domain; a decomposition stage configured to carry out a primary ambient decomposition for three-channel triplets of blocks of said frequency domain channel-based 3D audio signal, wherein related directional signals and ambient signals are provided for each triplet, and wherein said primary ambient decomposition includes a directional and ambient power estimation, a linear spectral estimation based on minimum mean square error principle, and a post-scaling of the estimated spectra such that power maintenance is achieved; and at least one other stage configured to: derive, from said directional signals, directional information of a total directional signal for each triplet; HOA encode said total directional signal according to said derived directions, and HOA encode ambient signals according to channel positions; superimpose HOA coefficients of said HOA encoded directional signal and HOA coefficients of said HOA encoded ambient signal in order to obtain an HOA coefficients signal for said channel-based 3D audio signal; and transform said HOA coefficients signal to time domain.
 10. The apparatus of claim 9, wherein the transform stage is configured to carry out windowing and overlapping in connection with said transform from time domain to frequency domain, and the at least one other stage is configured to carry out windowing and overlap-add in connection with said transform from frequency domain to time domain.
 11. The apparatus of claim 9, wherein, in case there are more than three channels, the decomposition stage is configured to perform a triangulation in that channels of said channel-based 3D audio signal are divided into non-overlapping triangles or triplets with three-channel positions as vertices.
 12. The apparatus of claim 11, wherein in case the channel positions of said channel-based 3D audio signal are given in 3D space on a unit sphere, said triangulation is accomplished by means of a Delaunay triangulation using the Quickhull algorithm.
 13. The apparatus of claim 9, wherein the decomposition stage is configured to carry out said primary ambient decomposition for said triplets successively and the decomposition order is carried out according to triplet powers, such that a triplet with a higher total power is decomposed earlier than a triplet with a lower total power, wherein the total power is the sum of three channel powers belonging to a triplet.
 14. The apparatus of claim 9, wherein based on the decomposition order, the decomposition stage is configured to carry out said primary ambient decomposition for individual triplets, thereby delivering directional and ambient signals of three channels, and wherein three directional signals are combined to a total directional signal according to the principle of summing localisation, while the directions are derived by means of panning laws.
 15. The apparatus of claim 9, wherein said decomposition stage is configured to determine primary ambient decomposition including by: calculating, for a block ($X_{m}[i]$) of multichannel spectral bins, signal powers $P_{m}[i]$ and inter-channel cross correlations $c_{mn}[i]$ between different channel signals, wherein 1≤m≤3 denotes a specific triplet after triangulation, m,n denote two different channels and i denotes a frequency bin index; calculating a directional signal power $P_{S_{m}}[i] = \frac{|c_{mn_{1}}[i]|\,|c_{mn_{2}}[i]|}{|c_{n_{1}n_{2}}[i]|}$, m≠n₁, m≠n₂, n₁≠n₂, 1≤m, n₁, n₂≤3, wherein $c_{n_{1}n_{2}}[i]$ is the cross correlation for the i-th frequency bin between channel n₁ and channel n₂, which both are different from channel m; if calculated said signal power $P_{m}[i]$ is smaller than directional power $P_{S_{m}}[i]$, post-processing said directional power $P_{S_{m}}[i]$ such that it is less than $P_{m}[i]$ and approaches $P_{S_{m}}[i]$ as far as possible; calculating a band signal power $P_{m,b}$, a band-wise inter-channel cross correlation $c_{mn,b}$, a directional band power $P_{S_{m},b}$ and an ambient band power $\sigma_{m,b}^{2} = P_{m,b} - P_{S_{m},b}$, wherein b denotes a band; calculating a primary-to-ambient ratio $\mathrm{PAR}_{m}[i] = P_{S_{m}}[i]/\sigma_{m}^{2}[i]$ for each individual channel and their sum $R_{s}[i] = \sum_{m=1}^{M}\mathrm{PAR}_{m}[i]$, or calculating a primary-to-ambient ratio $\mathrm{PAR}_{m,b} = P_{S_{m},b}/\sigma_{m,b}^{2}$ for each individual band and their sum $R_{s,b} = \sum_{m=1}^{M}\mathrm{PAR}_{m,b}$; estimating directional and ambient signal spectra based on $\mathrm{PAR}_{m}[i]$ and $c_{mn}[i]$, or based on $\mathrm{PAR}_{m,b}$ and $c_{mn,b}$, respectively; scaling said estimated directional and ambient signal spectra such that an attenuation caused by said spectral estimation is reversed.